Examination, Undergraduate Assessment Program of the College Entrance
Examination Board, American College Testing (ACT) Program achievement
tests, ACT College Outcome Measures Project, College Level
Examination Program, critical thinking and higher level outcome
measures, basic skills); (2) academic-motivational outcomes; and (3)
academic-behavioral outcomes (career/life goal exploration,
diversity, persistence, faculty-student relationships). Appended are
charts showing the categories of the NCHEMS Outcome Structures, and
lists of basic skills tests for college students. Seventy references
are included. (LB)
Introduction



The purpose of this working paper is (1) to identify and describe
some current methods and instruments for assessing college student
academic outcomes, (2) to suggest possible outcome measures for
NCRIPTAL's research program, and (3) to suggest methods of outcome
assessment for other researchers and practitioners. In the process
of this exploration we review existing literature on outcome
assessment and other related research literature on college outcomes
to set the stage for a discussion of appropriate methods for
recognizing improved teaching and learning. We also examine
typologies and frameworks for understanding outcomes developed by
several scholars.

The literature on college outcomes and their measurement is evolving
rapidly. We acknowledge a substantial debt to several scholars,
particularly Alexander Astin, Howard R. Bowen, Peter Ewell,
C. Robert Pace, Ernest Pascarella, and others, from whose earlier
reviews we have selected and summarized material liberally. Along
with these scholars and other organizations, such as the American
Association for Higher Education, which has gathered a substantial
collection of literature on outcome assessment, NCRIPTAL desires to
be of service to educators who seek a broad, nontechnical summary of
this emerging field.

Undoubtedly, there is new and important literature that we have
overlooked or that is still in press. It is particularly difficult
to know of valuable work in progress at individual campuses, but we
believe that this general summary will encourage practitioners and
researchers to inform us of their successes as well as the
difficulties they encounter. Therefore, this paper is a working
document to be updated periodically as NCRIPTAL learns of the work
of educators who have developed new measures and new techniques.

This paper is a working document in another sense as well. It has
been developed during the first few months of our existence as a
national center to conduct research and provide leadership in
improving postsecondary teaching and learning. Concurrently, several
NCRIPTAL researchers are developing reviews of outcome measures that
can be used to assess specific aspects of student academic
development. Because of their concurrent development and the
technical nature of these related reviews, they are mentioned only
briefly in this overview. A list of the concurrently developed
papers follows the title page.

As NCRIPTAL's work proceeds over the next several years, an
important goal is to understand connections among more specific
areas of research on student growth. Thus, future syntheses will
describe and develop more completely the relationships among
potential measures of student outcomes within the NCRIPTAL typology
described later in this paper.

The paper is organized in the following manner. Section I defines
college outcomes from several perspectives and discusses the
importance of outcome assessment at the postsecondary level.
Pressure from the academic community and from federal and state
agencies has increased interest among educators in assessing student
achievement. The resulting interest has brought into focus a number
of issues about outcome assessment, including choices among
appropriate models for outcome assessment. NCRIPTAL's mission in
improving postsecondary teaching and learning and its relation to
outcome assessment is introduced.

Section II reviews various approaches to outcome assessment as well
as existing typologies for classifying outcomes. The recent works of
Pace, Astin, Bowen, Ewell, Lenning, and others are discussed. A
typology tentatively adopted by NCRIPTAL researchers is introduced
and delimits the discussions of outcomes in Section III.

Section III presents a number of common 
outcome measures as potential measures both for 
NCRIPTAL and for the research community and 
educators in general. Areas of the NCRIPTAL 
typology in which outcome measures are under- 
developed are noted. 
I. Outcomes and Outcome Assessment



Before engaging the issues of outcome assessment, a review of the
definitions of outcomes that have been used by researchers is in
order. In their groundbreaking review, Feldman and Newcomb (1969)
refer to the impact of college on students rather than to outcomes.
They view impact as the influence of colleges on student
orientations and characteristics. Bowen (1977) takes an economic
approach to outcomes, defining them as the result of transformed
institutional resources. The primary product of the transformation
is individual learning; additional products include changes in other
intangible individual qualities.

Pace (1979) defines outcomes as changes that are widely accepted as
goals of higher education and that are the result of events and
experiences in college designed to help students attain these goals.
Astin (1980) uses a value-added approach to outcomes. He specifies
that outcomes are the measured differences between entry
characteristics of a student and the characteristics of a student on
exit from college. Ewell's (1983) definition of outcomes coincides
with Astin's. He defines outcomes as any change or consequence that
occurs as a result of enrollment in an educational institution and
participation in its programs.
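
As a minimal sketch of the value-added idea, the snippet below treats an outcome as the difference between an exit measure and an entry measure for each student. The student identifiers, the scores, and the function name `value_added` are hypothetical and are introduced only for illustration; they are not drawn from Astin or Ewell.

```python
from statistics import mean


def value_added(entry_scores, exit_scores):
    """Return exit-minus-entry differences for students measured at both points."""
    # Only students with both an entry and an exit score can show a measured change.
    common = entry_scores.keys() & exit_scores.keys()
    return {student: exit_scores[student] - entry_scores[student]
            for student in sorted(common)}


# Hypothetical scores on the same instrument at entry and at exit.
entry = {"s1": 48, "s2": 55, "s3": 61}
exit_ = {"s1": 60, "s2": 58, "s3": 70, "s4": 66}  # s4 has no entry score

gains = value_added(entry, exit_)
average_gain = mean(list(gains.values()))
print(gains)
print(average_gain)
```

A simple difference of this kind describes what changed; by itself it does not show why the change occurred.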

The major differences among these definitions relate to whether they
address the question of what the outcomes are or the question of why
outcomes occur. Pace (1979) claims that Feldman and Newcomb, through
the use of the term impact, are attempting to explain the causes of
certain outcomes. The question that Feldman and Newcomb address,
therefore, is why certain changes occur. In contrast, Pace prefers
simply to address the issue of change. He claims that by measuring
change the "what" question being addressed is: What are the outcomes
of college? This question, he believes, is much simpler to address
and provides primary evidence of the results of a college education.

For the purpose of NCRIPTAL's research program for improving
teaching and learning, both the "what" and the "why" questions are
important. This paper focuses on determining what changes occur;
other NCRIPTAL literature reviews focus on various reasons for the
outcomes. This paper also addresses the question of which measures
may be most appropriate for measuring the outcomes of postsecondary
learning.
The outcome definition most closely suited to NCRIPTAL's model is
Astin's definition of outcomes. NCRIPTAL's use of a value-added,
change model will assist in the discovery of the effects of various
instructional, programmatic, and individual characteristics on the
teaching and learning process. The framework NCRIPTAL has adopted to
focus its work is presented in Figure 1 (see page 7).

Current Pressures for Outcome Assessment 

Currently there are pressures in postsecondary education from two
directions for outcome assessment: from the academic community
itself and from employers of college graduates. First, the academic
community is calling on postsecondary institutions to use assessment
as a means of improving the quality of education. A number of books
and articles have pointed to an apparent lack of quality control in
collegiate education and have suggested measurement of student
progress as one means of rectifying the situation. The National
Institute of Education's Involvement in Learning (NIE Study Group,
1984) calls for the systematic assessment of students' knowledge,
capacities, and skills as a way of addressing problems in the
undergraduate curriculum. In To Reclaim a Legacy (1984), William
Bennett also called for curricular reform, minimum standards, and
assessment as a way of standardizing the meaning of the
undergraduate degree. Integrity in the College Curriculum
(Association of American Colleges, 1983) posits that the absence of
institutional accountability is a grave problem and that the
measurement of student progress poses a solution to the dilemma.
Access to Quality Undergraduate Education (Southern Regional
Education Board, 1985) calls for a cooperative effort within the
educational community to find ways of improving quality while
maintaining access. The report suggests that new ways of measuring
student progress and performance are needed to resolve the
access/quality problem.

An Emphasis on Quality 

A key issue raised in all of these reports is quality. For years the
worth of a college education went unquestioned; it was typically
assumed that college graduates left college with an increased amount
of knowledge and understanding. Today, however, public attitudes
have changed. A number of indicators point to a decrease in the
quality of college education, and this has resulted in a call for
accountability for improved teaching and learning. Among the cited
indicators are: (1) a large number of students who need remedial
courses at the college level, (2) a decline in student scores on
verbal sections of standardized tests, (3) a decline in graduate
scores on standardized tests and professional licensing exams, and
(4) an increased number of students pursuing professional and
occupational studies rather than a liberal arts education (Hartle,
1985). Despite disagreements over appropriateness of these
indicators, many believe the quality problem is real.

These reports have pointed to assessment as a solution to the
problem. The common themes in these reports include a need for
stronger student performance, clearer expectations of what college
students should learn, and more rigorous measurement of educational
achievement.

The expressions of concern about educational quality are not unique
to the educational/academic community. Spokespersons from private
industry, government, and accreditation agencies have called for
increased institutional accountability as well (Ewell, 1985). The
private sector, as the major employer of college graduates, desires
more uniformly high quality in the graduates it hires. In addition,
state legislators are becoming increasingly concerned with the
return states are getting from their investment in higher education.
These constituencies represent the second pressure calling for
postsecondary assessment.

Issues that Hinder Assessment 

Despite this increased pressure on institu- 
tions to evaluate and assess student outcomes, 
very few colleges actually have established student
assessment programs (Ewell, 1985). This
limited response may be due to a number of 
concerns and problems revolving around student 
assessment. 

One of the most difficult issues surrounding outcome assessment is
the question of what the outcomes of college should be. At the
secondary level, the assessment of basic skills provides a commonly
accepted level of achievement. At the college level, however, there
is no common base level of higher academic skills that is
universally accepted. As Turnbull (1985) stated,

Beyond basic skills there lies an immense realm of
disagreement about collegiate goals.... It is essential
to realize that the purposes of higher education are
a matter of fundamental debate. (p. 24)



This lack of consensus exists within colleges as
well as between colleges. Hartle (1985) recognizes
the problem within institutions:

The central problem is that measuring educational 
achievement may well require more agreement 
about the ends and means of a higher education 
than we have at most institutions, (p. 15) 

The development of consensus on minimum requirements is crucial for
developing a successful outcome assessment program within a given
institution. Consensus among institutions would be even more
difficult to attain and might result only in agreement on very
minimal outcomes.

A common set of stated outcomes for all in- 
stitutions would make it difficult to take into 
account the broad range of institutional goals and 
missions. To be useful and effective, assessment 
programs must address the diversity of institu- 
tional goals. In a society with a diversified and 
decentralized system of postsecondary education, 
assessment programs must be tailored to fit the 
needs of the various institutions. 

In his discussion of accountability, Bowen (1974) recognizes the
diverse goals of institutions and calls for a matching of assessment
programs to college goals. He proposes that to attain true
institutional accountability an institution must (1) define goals
and order priorities, (2) measure and identify outcomes, (3) compare
the outcomes with the goals to determine the degree to which goals
have been met, and (4) measure the cost and determine whether it is
reasonable (p. 9).
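
Bowen's four steps can be read as a simple procedure, sketched below under stated assumptions. The `Goal` fields, the ratio used for "degree to which goals have been met," and the rule that cost is "reasonable" when it does not exceed a budget are all hypothetical choices made for illustration; Bowen does not prescribe them.

```python
from dataclasses import dataclass


@dataclass
class Goal:
    name: str
    priority: int    # step 1: 1 = highest priority
    target: float    # step 1: desired level on an agreed measure
    measured: float  # step 2: observed outcome on the same measure


def accountability_report(goals, cost, budget):
    """Steps 3 and 4: compare outcomes with goals, then judge the cost."""
    ordered = sorted(goals, key=lambda g: g.priority)
    # Step 3 (assumed metric): attainment as measured outcome over target.
    attainment = {g.name: g.measured / g.target for g in ordered}
    # Step 4 (assumed rule): cost is reasonable if it stays within budget.
    cost_reasonable = cost <= budget
    return attainment, cost_reasonable


# Hypothetical goals and figures for one institution.
goals = [
    Goal("writing proficiency", priority=2, target=50.0, measured=50.0),
    Goal("basic quantitative skills", priority=1, target=80.0, measured=72.0),
]
attainment, cost_reasonable = accountability_report(goals, cost=900.0, budget=1000.0)
print(attainment)
print(cost_reasonable)
```

Because the goals and their measures are supplied by the institution, the same procedure can accommodate quite different institutional missions.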

Once the question of consensus and outcomes has been resolved, the
method of measurement arises as an issue. Numerous measures are
available for measuring both cognitive and affective outcomes among
students. Deciding on the appropriate measures is difficult. Harris
(1985) offers guidelines for practitioners getting started with an
assessment program in higher education.

Assessing improved teaching and learning is also somewhat hindered
because of political problems it can pose for faculty and
administrators. Ewell (1983) identifies a number of concerns that
cause administrators to avoid assessment. First of all, they fear
that no positive impact will be found and that results will reflect
badly on their leadership. Ewell believes this fear is unfounded,
given a number of successful attempts in finding positive outcomes.
Administrators are also concerned about the misinterpretation of
quantitative results. Although some outcomes may be qualitative in
nature, the measures used most typically are quantitative
descriptors. Administrators and faculty fear that people may place
too much weight on the numbers, losing sight of the fact that the
measures are only proxies for the actual outcomes. In addition, the
false precision of quantitative measures may further complicate the
public interpretation of the results. These fears among educators
have hindered the development of student outcome assessment
programs.

Purposes of Outcome Assessment 

Even if agreement could be reached on what the outcomes of college
should be, assessment holds varied meanings for various parties.
According to Hartle (1985), the term assessment, in education, is
often used interchangeably with evaluation and measurement. Yet,
there are subtle differences among these terms. Assessment is the
process of gathering data (measurement) and assembling evidence into
an interpretable form for some intended use. Once the information is
gathered, judgments (evaluations) may be made based on the evidence.
Measurement, therefore, is only a part of the assessment process,
and is not necessarily synonymous with assessment, whereas
evaluation implies judgments based on the collated measures.

In addition to the problem of defining assessment, individuals tend
to associate assessment with numerous and diverse activities. Hartle
(1985) mentions six separate and non-parallel but overlapping
activities that may be believed to represent assessment activities
in higher education. One activity uses multiple measures and
observers to monitor students' intellectual and personal growth.
Another assessment activity may be associated with state-mandated
requirements for evaluating student progress or academic program
success. A third assessment activity is the value-added method for
measuring student progress, which involves pre- and post-testing and
attribution of gains to the college experience. A fourth activity
involves the use of standardized testing to measure the extent of
student knowledge. A fifth assessment activity uses assessment in
fund allocation, frequently by rewarding institutions based on
performance criteria. Finally, measurement of changes in student
attitudes and values is a sixth activity considered part of the
assessment domain.

Locus of Assessment 

In addition to the issues surrounding agreement on assessment
outcomes and activities, issues arise over the level of analysis at
which assessment occurs. Data gathering for assessment programs can
occur at one or more of three levels: the individual student, the
academic or department program, and the institution. The usefulness
of assessment at each level depends on the purpose of the
investigation. Some individuals have argued that assessment should
focus on the individual learner (Hartle, 1985; Bowen, 1979). Others
view the academic department as the appropriate level of analysis
for investigating teaching and learning environments (Winteler,
1981). At a higher level of aggregation, measures have been
developed to assess institutional outcomes. Pascarella (1985),
however, asserts that institutional differences are much more
difficult to pinpoint and that differences in outcomes are more
effectively explained through individual characteristics.

One final important issue concerning outcome assessment is whether
the process is internally or externally administered. While some
institutions have established their own programs (e.g., Alverno
College), others have received state mandates to use standardized
testing to assess the quality of education (e.g., Tennessee and
Florida). In general, academic institutions tend to feel threatened
by external control and fear loss of autonomy. Some assessment
proponents indicate that these fears should inspire institutions to
initiate internally directed programs before state-level initiatives
are realized. Institutional assessment programs may have needed
flexibility to address most appropriately the particular goals of
the institution.

There is strong opinion among some authorities that the pressure for
educational assessment will not dissipate. For example, basing his
conclusion on several factors, Hartle (1985) warns that assessment
is not a passing fad. First, now that the issue of student access
has been resolved, the focus on quality assurance is essential.
Second, there is a widespread public concern over the lack of value
placed on teaching in some colleges. Finally, state governments are
well-informed and interested in the quality of education.

Such predictions, and the abundance of 
existing definitions and interpretations about 
them, indicate a continuing need to specify at 
least eight parameters for effective discussion 
about assessment of student outcomes. 

1. What are the purposes or incentives for as- 
sessment? Assessment activities to satisfy 
state mandates may differ substantially from 
those undertaken to improve learning within 
an institution. 

2. The type of assessment being discussed is important. For example,
is information on student values, student academic achievement, or
the employment rate of new graduates to be gathered? Is outcome
information to be gathered only on those outcomes upon which some
group has achieved consensus or, with the possibility of improving
understanding or consensus, should it also be gathered in areas
about which little agreement exists?

3. At what level is the assessment to occur? For example, is it
important to learn about progress of individual students or groups
of students, about students in specific academic programs, about the
programs themselves, or about the institution as an aggregate of
students and programs?

4. What will be the form of assessment? For example, is the
assessment to focus on measures of student change (the "value-added"
approach) or to determine whether students have reached some
expected level of achievement? Can these two forms be effectively
combined? Will multiple measures or a single measure of each outcome
be used?

5. What agency will be responsible for administration of an
assessment program? Possibilities range from external groups such as
state agencies and legislatures to groups of faculty in specific
academic programs.

6. If evaluation is to follow assessment, what will be the locus of
evaluation? That is, once the information has been gathered in an
interpretable form, who will make evaluative judgments about what
types of change or stability are suggested?

7. What will be the locus of decisions about the 
appropriate use of the information or about 
any evaluative judgments that are made? 

8. What will be the use of the evaluative judgments made on the
basis of assessment activities? For example, if judgments are made,
will they be about the merit or worth of students, the merit or
worth of programs, the merit or worth of institutions? Or will they
be focused on specific recommendations for improvement of student
learning, of program quality, or institutional functioning?

These eight parameters may be interrelated in many ways; certainly
decisions about each will influence decisions about the others. Many
observers would agree, however, that the most crucial linkage is
that between the purpose of assessment and use of data gathered in
the assessment process. Another logical linkage (perhaps less clear)
may exist between the level at which assessment is undertaken and
the level of administration. Leaving other linkages unspecified as
unique to a given situation, we have illustrated the importance of
these relationships by arranging the eight parameters as shown in
Figure 1.



Policy Parameters                        Technical Parameters

Use                              <-->    Purpose
Locus of Administration          <-->    Level of Assessment
Locus of Decisions about Change          Type of Assessment
Locus of Evaluation                      Form of Assessment

Figure 1. Parameters of Outcome Assessment



One implication of the arrangement in Figure 1 is that the
parameters on the left, those having to do with loci of assessment
and subsequent evaluations and decisions, are basically policy
issues. Those on the right, describing type, form, and level of
assessment, are largely technical issues. Both sets strongly depend
on the purposes and planned uses of assessment. In the following
section, we describe briefly NCRIPTAL's mission. Our purpose in
doing so is to show how our efforts assume a specific institutional
purpose for assessment activities, namely, improvement in the
teaching and learning environment. We expect, therefore, that
institutions will also be responsible for using information they
gather through use of outcome measures. Consequently, the type,
level, and form of outcome measures selected, developed, and used in
NCRIPTAL's research agenda will be those most suitable for use by
institutions interested in improving teaching and learning.

NCRIPTAL's Research Mission and Model

The National Center for Research to Improve Postsecondary Teaching
and Learning will focus its research, development, and dissemination
activities on five aspects of college learning environments that
affect learner outcomes: classroom learning and teaching strategies,
curricular structure and integration, faculty attitudes and teaching
behaviors, organizational practices, and the use of emerging
information technology. While recognizing multiple student outcomes
of college, such as cognitive development, personal development, and
career development, the Center initially will emphasize cognitive
development of undergraduate students in colleges that concentrate
on teaching as their primary mission. This emphasis was chosen
because the recent dramatic progress of research in cognition holds
great promise for improving learning and teaching. Furthermore,
student cognitive development is intimately linked to career
development and to other important outcomes such as the development
of a sense of self-efficacy, personal responsibility, and
motivation.

Students' cognitive and affective characteristics, which vary with
their diverse backgrounds, are important conditioners as well as
predictors of learning experiences. Since learners of many
backgrounds and ages now attend college and since instructors may
select an increasing variety of potentially effective strategies,
the Center will attempt to discover optimum combinations of learner
characteristics and instructional processes to facilitate cognitive
development.

To complete this mission, a research framework for NCRIPTAL's work
has been developed. Considered simply, this model (see Figure 2)
includes three general research variables: student characteristics
(independent variables), teaching/learning environments (alterable
variables), and student outcomes (dependent variables). Student
characteristics are motives, learning styles, prior knowledge,
skills, and other characteristics that students bring with them to
college. These characteristics interact with institutional
environments to determine learning.

The teaching/learning environments are influenced by the faculty,
the curriculum, the teaching and learning strategies, the
institutional practices, and by the technological environment.
Although extracurricular and interpersonal factors also influence
the institutional environment, these variables will not be included
in NCRIPTAL's research agenda. Additionally, the environmental
factors interact and overlap, and these interactions will be
recognized in our research agenda.

Student outcomes here are defined as the 
results of students' involvement in teaching/
learning environments. These outcomes may be 
both long-term (measurable throughout life after 
completion of college) and short-term (measurable 
during or immediately following the college experi- 
ence), but NCRIPTAL will focus on the short-term 
outcomes as most directly applicable to improving 
teaching and learning. Although briefly charac- 
terized in the usual manner as "independent" 
variables, student characteristics and goals may
also change as a result of the college experience. 
Thus, a feedback loop is included in the Center 
model indicating that education is an iterative 
process whereby the typically independent vari- 
ables are affected over time by the teaching/ 
learning environments. 




Figure 2. Variables in NCRIPTAL's Research Agenda 
II. Approaches to Outcomes and Outcome Assessment



A number of researchers have attempted to classify outcomes and
specify approaches for assessing outcomes. In this section,
approaches by Ewell, Astin, Lenning et al., and Bowen are discussed.

Ewell (1983) discusses three approaches that have been used to
measure student outcomes: the academic investigation perspective,
the student-personnel perspective, and the management perspective.
These approaches are based on the purpose of the investigators and
thus use different perspectives on outcomes, have different goals
for using outcomes, and involve different data requirements.

Academic investigation (research) is the oldest and most commonly
used reason for measuring student outcomes. The college experience
is investigated in a typical research fashion: theories about
student growth are developed, tested, and refined as a result of
data collection. From this perspective, most of the research on
student outcomes has been done by psychologists and sociologists.
Frequently psychologists have focused on the impact of college on
personal and cognitive development, and sociologists have
concentrated on such issues as the impact of college on social
mobility and socialization of students into the professional fields.
In this perspective the goal is to explain (and ultimately to
predict) human behavior, and the data collected must have high
empirical quality and be objective. While some of the relationships
discovered in this research have been used by institutional
policymakers, it should be noted that decisional utility is not the
goal; the purpose is to successfully account for a given outcome.

The student personnel approach uses student outcomes as a means for
evaluating students for admission to programs and placement on
completion of the program. The data are also used for counselling
students in career selection and for evaluating the effectiveness of
programs for meeting student needs. In this perspective the goal of
outcome measurement is to gain assessment information about
individual students. Data are considered useful if they provide
information for student placement or if they are diagnostic of
student problems. The theoretical constraints of data collection are
not crucial when using this approach.

The management perspective for measuring outcomes is yet another
approach to outcome assessment. From this perspective the focus is
on the use of outcome assessment as a method to improve
administrative decisions, particularly those involving program
planning and budgeting. The goal of outcome assessment in this
perspective is to improve the quality of resource-allocation
decisions. To meet this goal, data must be empirically valid,
reliable, and perceived by the decision makers as relevant to the
decision.

Ewell's classification of approaches to stu- 
dent outcomes is useful because it calls attention 
to varied uses of outcomes and the ways in which 
different goals influence the collection of student 
outcome information. 

In addition to classifying approaches to outcome assessment based on
proposed uses, researchers have attempted to classify types of
educational outcomes. Astin (1974) developed a taxonomy of student
outcomes involving three dimensions: type of outcome, type of data,
and time. The types of outcome are split into two domains: cognitive
and affective. The cognitive domain includes outcomes such as basic
skills, general intelligence, and higher-order cognitive processes.
The affective domain includes outcomes often described as attitudes,
values, and self-concept.

The data dimension is also split into two do- 
mains: behavioral and psychological. This di- 
mension distinguishes between outcome data that 
are covert and those that are observable. The be- 
havioral domain refers to observable activities of 
the individual. The psychological domain refers to 
the internal states or traits of the individual. 
While the actual outcomes may be the same, the 
ways in which the information is gathered to 
represent them are different. 

The primary two dimensions of Astin's approach are shown in Table 1.
This typology has been widely accepted as a method for classifying
outcomes. In Astin's typology the third dimension, time, stresses
the importance of including both the long- and short-term outcomes
of college. Some examples of applying the time dimension to the
outcome cells are provided in Table 2.

In addition to the typology, Astin (1974)
provided some insights into the assessment of 
educational outcomes. To him the fundamental 
purpose of assessment is to produce information 
that is useful for decision making. Thus meas-
urement should begin with a value statement: an
idea about what future state would be desirable or 
important. 

Lenning and Associates (1983) at the Na- 
tional Center for Higher Education Management 

TABLE 1

A Taxonomy of Student Outcomes

                                  OUTCOME
DATA             Affective                   Cognitive

Psychological    Self-concept                Knowledge
                 Values                      Critical Thinking Ability
                 Attitudes                   Basic Skills
                 Beliefs                     Special Aptitudes
                 Drive for Achievement       Academic Achievement
                 Satisfaction with College

Behavioral       Personal Habits             Career Development
                 Avocations                  Level of Educational Attainment
                 Mental Health               Vocational Achievements
                 Citizenship                 Level of Responsibility
                 Interpersonal Relations     Income
                                             Awards or Special Recognition

Source: Alexander W. Astin, R. J. Panos, and J. A. Creager, National Norms for Entering College Freshmen - Fall 1966
(Washington, D.C.: American Council on Education, 1967): p. 16.



TABLE 2

Outcomes Over Time

OUTCOME     DATA            SHORT-TERM INDICATOR      LONG-TERM INDICATOR

Affective   Behavioral      Choice of major           Current occupation
                            field of study
Affective   Psychological   Satisfaction with         Job satisfaction
                            college
Cognitive   Behavioral      Persistence               Job stability
Cognitive   Psychological   LSAT score                Score on law boards

Source: Astin, 1974, p. 33.



Systems (NCHEMS) developed an extensive frame-
work for identifying the universe of "outputs"
and outcomes of postsecondary institutions.
In developing this taxonomy, the authors sought
to develop an exhaustive list of outcomes to assist
in the assessment of managerial effectiveness. As
a result of the management perspective, Lenning
et al. did not focus exclusively on student out-
comes but rather included them in two of the
several categories: human characteristics out-
comes and knowledge, technology, and art forms
outcomes. Viewed in Astin's terms, the human
characteristics outcomes include primarily affec-
tive and personality characteristics, as well as
skill outcomes. The knowledge, technology, and
art forms category includes the typically cognitive
outcomes: both specialized and general knowl-
edge and scholarship. Additional outcome catego-
ries in this framework include (1) economic (e.g.,
economic security, standard of living), (2) resource
and service provision (e.g., teaching, facility
provisions), and (3) other maintenance and
change (e.g., traditions, organizational operation).
A listing of the complete NCHEMS taxonomy is
included in Appendix A. Clearly this framework
includes both long- and short-range student
outcomes as well as outcomes at the program and
institutional level.

Bowen (1974) took a slightly different ap-
proach from the two previous researchers when
discussing outcomes, relating each of three
services of higher education to its outcomes.
Instruction is related to the
outcome of learning and changes in human traits. 
Research and scholarship relates to the outcomes 
of preservation, discovery, and interpretation of
knowledge, artistic and social criticism, philo- 
sophical reflection, and advancement of the fine 
arts. Public service results in societal outcomes
such as improved health, solutions to social
problems, and agricultural productivity (pp. 2-3).

Of these three services, Bowen believes that 
instruction is the primary goal of higher education 
and bringing about desired changes in students is 
central to this mission. Bowen's approach could 
be viewed, therefore, as primarily academic in 
nature. He focused on investigating the changes 
that occur among students without emphasizing 
the use of these measures in either placement or 
decision making. 

In a later work, Bowen (1977) broadened his
view of student learning and offered a more elabo-
rate catalogue of accepted goals. This catalogue
of goals serves also as a typology of student out-
comes derived from three widely accepted goals of
instruction. These three general goals are: edu-
cating the whole person, addressing the individu-
ality of students, and maintaining accessibility.
The first goal, educating the whole person, refers
to the idea that education should cultivate both
the intellectual and affective dispositions of
persons, thereby enhancing intellectual, moral,
and emotional growth. The second goal, address- 
ing individuality, requires that the uniqueness of 
individuals be taken into account in the educa- 
tional process. Accessibility refers to the notion 
that education should be readily available to a 
broad range of persons. 

According to Bowen, the catalogue of goals
derived from these general goals constitutes both
a model for the educational system and the
criteria by which the system can be judged. While
Bowen recognized that his goal typology has
utopian qualities, he posits that it provides a
useful model that can be used to shape and guide
institutional functioning.

In Bowen's scheme specific educational goals 
are divided into two groups: goals for individual 
students and goals for society. The five categories 
of goals for individual students include: cognitive 
learning, emotional and moral development, 
practical competence, direct satisfactions from 
college education, and avoidance of negative 
outcomes. In a further subdivision, cognitive 
learning includes ten specific areas of learning. 
They are: 

1. Verbal skills: Ability to read, speak, and 
write clearly and correctly. 

2. Quantitative skills: Understanding of 
mathematical and statistical concepts. 

3. Substantive knowledge: Acquaintance with
Western culture and traditions and familiar-
ity with other cultures. Knowledge of con-
temporary philosophy, art, literature, natu-
ral science, and social issues. Understand-
ing of facts, principles, and vocabulary
within at least one selected field.

4. Rationality: Ability to think logically and
analytically, and to see facts clearly and ob-
jectively.

5. Intellectual tolerance: Openness to new
ideas, curiosity, and ability to deal with
ambiguity and complexity.

6. Esthetic sensibility: Knowledge of and inter-
est in literature, the arts, and natural
beauty.

7. Creativeness: Ability to think imaginatively
and originally.

8. Intellectual integrity: Respect for and
understanding of the contingent nature of
truth.

9. Wisdom: Ability to balance perspective,
judgment, and prudence.

10. Lifelong learning: Sustained interest in
learning. (Bowen, 1977, pp. 35-36)

Bowen's remaining four categories of student 
goals are focused primarily on affective and long-
term student outcomes.

Bowen also suggested seven principles that
should be used in the identification of outcomes
and thus in outcome assessment at particular
colleges. The first principle is that inputs should
not be confused with outputs. Bowen claims that
high institutional expenditures (an input) do not
guarantee equivalently high outcomes; the differ-
ence between inputs and outputs has too often
been ignored. The only valid outcome measure-
ment is of the development and changes that
occur in students as a result of their college
experience.

The second principle suggests that assess- 
ment should be linked to all educational goals, 
not just to those developments easily measured or
related to economic success. Bowen offers his
catalogue of goals as a starting point on which to
build an assessment plan. 

The third principle states simply that educa- 
tional outcomes should relate to the person as a 
whole; and the fourth principle posits that out- 
come assessment should include the study of 
alumni as well as current students. The fifth 
principle suggests that outcome assessment 
should measure changes that occur as a result of 
the college experience. 

The sixth principle states that an evaluation 
scheme must be practical: not too time-consum- 
ing or expensive. The assessment should focus 
on major goals of the institution and need not be
based on the entire population of students. How- 
ever, results must be reported in a form that the 
general public can read and understand. 

The final principle asserts that assessment 
should be controlled from within the institution 
rather than being imposed by external agencies.
Assessment programs should be designed for
each institution, keeping the special missions and
philosophies of the institutions in mind.

Ewell (1983) mentions additional outcome di- 
mensions that should be considered. These 
include whether (1) the effects are short- or long-
term, (2) the student is aware or unaware of the
outcome, (3) the effect is direct or indirect (i.e.,
how closely the outcome is connected to the edu-
cational program), and (4) the outcome is in-
tended or unintended. These dimensions repre-
sent important differences between outcomes that
should be considered in outcome research and 
assessment. 

In more recent work than that reviewed 
earlier, Astin (1979) identifies three core measures
of student outcomes that should be included in a
student outcome data base. First, students'
successful completion of a program of study
should be included. More specifically, informa-
tion is needed to determine whether students' ac-
complishments are consistent with their original
goals. Second, a measure of cognitive develop-
ment must be included, and more than grade
point average and class standing are needed.
Preferably, repeated measurement will be used so
that change can be assessed by comparing per-
formance at two points in time. Third, measures
of student satisfaction should include satisfaction 
with the quality of the curriculum, teaching, 
student services, facilities, and other aspects of 
the college. 

Beyond these essential measures, the stu- 
dent data should include information gathered on 
entry, during the educational process, and at exit 
or another designated point of time. Student 
characteristics should be recorded when they first 
enroll, information on what happens to the stu- 
dent while enrolled at the college must be avail- 
able, and measures of the degree of attainment of 
desired or behavioral objective at exit must also 
be accessible. This approach, developed by Astin,
is known as the "value-added" approach. It
asserts that outcome measures alone tell us very 
little about institutional effectiveness or impact. 
By controlling for entry characteristics, however, a 
more accurate picture of outcomes will emerge. 
In the absence of such data, outcome measures 
may be grossly misinterpreted when used for
assessing institutional effectiveness because most 
outcomes are highly dependent on the character- 
istics of students at entry. 
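
The logic of controlling for entry characteristics can be sketched with a small numerical example. Everything in it is an assumption made for illustration: the two colleges, the student scores, and the helper names `fit_line` and `college_means` are hypothetical, and a single pooled regression of exit score on entry score is only one simple way to adjust for entry characteristics, not necessarily the procedure Astin used.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least-squares intercept and slope for y = a + b*x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

# Hypothetical records: (college, entry score, exit score).
students = [
    ("A", 60, 70), ("A", 65, 74), ("A", 70, 78),
    ("B", 40, 56), ("B", 45, 61), ("B", 50, 66),
]

entry = [s[1] for s in students]
exit_scores = [s[2] for s in students]
intercept, slope = fit_line(entry, exit_scores)

def college_means(values):
    """Average a per-student value within each college."""
    groups = {}
    for (college, _, _), v in zip(students, values):
        groups.setdefault(college, []).append(v)
    return {c: mean(v) for c, v in groups.items()}

# Raw exit means ignore where students started.
raw_means = college_means(exit_scores)

# Residuals (actual minus predicted exit score) adjust for entry level.
residuals = [y - (intercept + slope * x) for x, y in zip(entry, exit_scores)]
value_added = college_means(residuals)

print(raw_means)    # College A has the higher raw exit mean
print(value_added)  # College B has the higher entry-adjusted mean
```

In these invented data College A shows the higher raw exit scores only because its students entered with higher scores; once entry level is controlled, College B's students exceed their predicted scores by more.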



NCRIPTAL's Delimited Outcome Framework

As discussed earlier, NCRIPTAL's mission in-
cludes both conducting basic research on the
effects of various aspects of the teaching and 
learning environment on student outcomes and 
providing leadership and assistance to institu- 
tions in their own assessment and evaluation 
efforts. Thus, in the terms of Ewell's "perspec-
tives," we must engage in a dual approach, com-
bining the academic-investigative spirit of basic
research and a management perspective that can 
help institutions construct their own assessment 
processes and uses of the information. 

Fulfilling this dual mission with available re-
sources requires delimitation of the arena in
which our work will be conducted and a selection
of outcome measures and assessment principles
that seem most closely related to practical con-
cerns in improving teaching and learning. Exist-
ing typologies, such as that proposed by Astin (see
Tables 1 and 2), the list of principles by Bowen,
and the important distinctions mentioned by
Ewell, as well as the work of many other scholars,
have been helpful in formulating our plans. In
Table 3 we have summarized some of these
propositions, attempting to group them as accu-
rately as possible under the "technical parameter"
headings discussed earlier, namely, "type of
outcome to be measured," "level of measurement,"
and "form of measurement." This grouping forms
the basis for our discussion of outcome measures
to be used in NCRIPTAL's work. It bears repeating
that only these three parameters of type, level,
and form are discussed because we have already
focused our work on a specific purpose (improve-
ment of teaching and learning) and assume that
results will be used for decisions consonant with
that purpose. Furthermore, our efforts are based
on the assumption that the administrative locus
of assessment activities and evaluative decisions
about this information all rest within the college
or university.

Type of Outcome Measures 

However desirable it might be for researchers 
and institutions to follow Bowen's suggestion to 
assess all possible outcomes and relate outcomes 
to the development of the whole person, such a 
global program would readily encounter problems
of feasibility and lack of consensus. Nonetheless, 
our discussion of outcome measures begins with 
the whole-person approach in an effort to deter- 
mine which subsets of this universe are of great- 
est importance. 

During such discussions we found many 
benefits, but some pitfalls, in Astin's encompass- 
ing four-fold typology of student outcomes (see 

TABLE 3

Propositions and Caveats about Type, Level, and Form of Outcome Measurement

Type of Outcome Measures     Level of Outcome Measures     Form of Outcome Measures

BOWEN

Assess all outcomes, even those difficult to measure
Relate outcomes to whole person

Focus on changes attributable to college
Focus on major institutional goals
Study alumni as well as current students

Separate inputs and outputs

Use practical and feasible means

EWELL

Distinguish intended and unintended outcomes
Distinguish outcomes of which student is aware and unaware
Distinguish outcomes closely linked to educational program

Distinguish short- and long-term outcome measures

ASTIN

Record whether students completed program and whether
accomplishments were consistent with their goals
Measure at various points in time; include information
at entry, during program, and on exit
Use measures of cognitive development beyond grade
point average
Include measures of student satisfaction



Table 1). Specifically, although Astin acknowl-
edged interactions between affective and cognitive
outcomes, his typology used these concepts as
two different primary dimensions. Consequently,
the typology made little provision for attention to
cognitive-personal outcomes or affective-academic
outcomes. Yet, many cognitive psychologists and
personality theorists believe that, particularly for
students who enter college with undeveloped
motivation or low self-efficacy, affective outcomes
may be related to academic as well as to personal
and social growth. As a result of these and
related discussions, we drew a slightly different
typology framework which notes three
"arenas" of student growth in college and three
forms through which changes in these arenas
may be observed. The resulting nine-cell frame-
work, which we stress was derived a priori from
our accumulated experience, is shown in Table 4.

The arena dimension refers to the various 
aspects of life in which the outcome is important.



The three arenas are personal, social, and aca- 
demic. The personal domain includes outcomes 
like personal worth, feelings about oneself, satis- 
faction with personal accomplishments, ability to 
make decisions, and using one's skills appropri-
ately. The social arena outcomes include ability
to function in interpersonal relationships, citizen- 
ship, social responsibility, social awareness, and 

TABLE 4

A Whole-Person Approach to College Student Outcomes

FORM OF               ARENAS OF GROWTH AND DEVELOPMENT
DEMONSTRATED
CHANGE                Social        Personal        Academic

Cognitive

Motivational

Behavioral

contributions to society. The academic arena
includes outcomes such as
motivation, critical-thinking abilities, problem-
solving skills, and goal exploration behaviors.

The form dimension also has three catego- 
ries: cognitive, motivational, and behavioral. 
This dimension specifies the form in which the
outcome is demonstrated. Cognitive outcomes 
are internal outcomes. Typically they occur 
within individuals' mental processes and their
existence is inferred, usually through testing. 
Motivational outcomes consist largely of the 
feelings that individuals have about themselves, 
their capabilities, and the world around them. 
These outcomes are generally self-reported, 
though some social-psychological methods exist 
that tap these attitudes more discretely. Behav- 
ioral outcomes may be reported by the individual 
or directly observed. 

As mentioned earlier, NCRIPTAL's research
program will focus on the academic arena shown 
in Table 4. In selecting this subset of the universe 
of college outcome measures for attention, we risk 
posing for others the same difficulty that Astin's 
typology posed for us. We acknowledge that the 
personal and social arenas cannot be separated 
from the academic arena; one's personal and 
social development affects one's academic devel- 
opment and the reverse is also true. Nonetheless, 
by constructing a framework that includes three 
cells, academic-cognitive, academic-motivational,
and academic-behavioral, we are able to encom- 
pass a broad set of outcomes of primary concern 
to colleges and the public as well as to incorporate 
recent theories of cognitive development. Table 5 
shows a more detailed view of the academic arena
and the types of outcomes that seem to fit into 
each of the three major cells. 

At first glance, some observers will believe we 
have violated Bowen's principle of separating 
inputs and outputs by classifying as outcomes 
some of those items listed in the academic-moti- 
vational cell. Traditionally, motivation, self- 
efficacy, involvement, and effort have been viewed 
as fixed attributes students bring to the educa-
tional process. Our view that these characteris-
tics are subject to change (in an intended or
unintended direction) as a result of the educa-
tional process is, in part, what caused us to 
modify previously existing outcome typologies. 
Although little attention has been given to these 
ideas, most colleges would agree, for example, 
that improved motivation is an outcome to be 
sought. While the original motivation a student
brings to college is an input, a new motivational 
level based on educational experiences becomes 
an outcome the student takes to the next stage of 
learning. 

An additional previously neglected aspect of
the iterative outcomes conception relates to 



TABLE 5

NCRIPTAL's Outcome Framework

FORM OF
MEASUREMENT      ACADEMIC ARENA

Cognitive        Achievement (facts, principles, ideas, skills)
                 Critical-thinking skills
                 Problem-solving skills

Motivational     Satisfaction with college
                 Involvement/effort
                 Motivation
                 Self-efficacy

Behavioral       Career and life goal exploration
                 Exploration of diversity
                 Persistence
                 Relationships with faculty



Ewell's distinction between student awareness or
lack of awareness of changes. Although we have
not included it in the list at this time, if students 
are to take increased responsibility for their 
learning, awareness itself may be an outcome to 
be sought. 

Level of Outcome Measures 

As already mentioned, both practicality and 
technical difficulties have caused us to set aside
Bowen's suggestion that alumni be studied in
addition to current students. Instead,
NCRIPTAL's agenda will focus on outcome meas-
ures that can be related directly to classroom and
program educational experiences. In general, our 
unit of analysis will be the individual student and 
groups of students sharing a common educational 
experience in a course or program. Whenever
possible, outcome measures for special popula-
tions of students (e.g., minorities, women, adult
students) will be examined in relation to similar
data for traditional students. 

Astin's point about whether students' even- 
tual accomplishments are consistent with their 
goals will be a special focus of one of our research 
programs. In fact, goals of students at college 
entry are subject to change in both intended and 
unintended directions. Since there would likely
be disagreement about what constitutes positive 
change, we have included an academic-behavioral 
outcome called "career and life goal development." 
The implication is that the student should gain in 
ability to explore, consider, and make decisions 
about eventual goals. 

Form of Outcome Measures

For many institutions, there may be an in- 
herent conflict between observing Bowen's caveat 
about feasibility of measurement and adopting 
Astin's value-added approach, which statistically
controls for student entry characteristics when
observing changes in student outcomes over time.
This is particularly true if measures of cognitive
development, such as reasoning skills and critical
thinking, are used to supplement more traditional
measures of academic achievement. In develop-
ing new measures and in assisting institutions
with the use of already developed measures,
NCRIPTAL will attempt to help simplify the appro-
priate use of outcome measures.

The next section of this paper describes 
some of the academic measures already in use by 
colleges and alerts the reader to some new meas- 
ures that NCRIPTAL staff hope to make available 
for future use. 
I. Outcome Measures and Outcome Research



As discussed in the previous sections, out-
come measurement recently has received in-
creased emphasis at the college level. Numerous
measures are available to assess learning in 
college. It is difficult for educators to choose 
among the widely diverse types of measures. In 
this section, some of the available measures will 
be reviewed. Reliability and validity information 
are included when available, and scholarly re-
search that has been conducted using the meas-
ures is reported. The purpose is to describe the
utility of the instrument for measuring improved 
learning and teaching by examining the 
measure's properties and the results it has pro- 
duced as a college outcome measure. New meas- 
ures and new reports on their uses are appearing 
daily. This review should be considered back- 
ground for future updates. 

The measures will be divided into three
sections consistent with the cells in NCRIPTAL's
typology: academic-cognitive outcomes, 
academic-motivational outcomes, and the 
academic-behavioral outcomes. 

Academic-Cognitive Outcomes

The following available measures are
reviewed here: 

Graduate Record Examination

American College Testing Program Achievement
Tests

Undergraduate Achievement Program of the
College Entrance Examination Board

The American College Testing Program College
Outcome Measurement Program (COMP)

College Level Examination Program 

National Teacher Examination 

Measures of critical thinking 

Measures of basic skills 

Graduate Record Exam 

The Graduate Record Exam, produced by the
Educational Testing Service, was initially intro-
duced as a general achievement test to measure
knowledge in three general categories: social
studies, natural sciences, and humanities. These
tests were different from the typical achievement
tests of that era because the items were meant to
evaluate students' ability to read, understand,
and interpret knowledge rather than to test
simply their possession of knowledge. Though
the items were somewhat content-embedded, the
initial area tests of the GREs were developed to
test ability to generalize from information that
was given (Pace, 1979).

The general test of the GREs is a test of developed
verbal, quantitative, and analytical abilities that
have been acquired by students over time. The
GRE general test is offered to college seniors and
graduates and is used by some graduate schools 
for admission decisions, fellowship awards, and 
prediction of an applicant's success in graduate 
school. 

Educational Testing Service also produces 
twenty GRE advanced subject tests that measure 
knowledge specific to certain fields. These tests
are intended for college seniors and are fairly
comprehensive. These scores also are used as
admission criteria in some graduate schools.

The K-R 20 reliability coefficients for the verbal
and quantitative sections exceed .90, and the
coefficient for the analytical section is .86
(Cohn, 1985). Though these values are highly
respectable, Jaegar (1985) warns that an internal
consistency measure such as the K-R 20 may
actually overestimate the reliability of the
general sections.
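
For readers unfamiliar with the K-R 20 coefficient, the following sketch shows how it is computed from right/wrong item data. The response matrix is invented for illustration, the function name `kr20` is hypothetical, and the use of the population variance of total scores is a stated assumption (some treatments use the sample variance instead). It is not a reconstruction of the ETS analysis.

```python
from statistics import pvariance

def kr20(responses):
    """Kuder-Richardson Formula 20 for dichotomous (0/1) item scores.

    responses: one list per examinee, one 0/1 entry per item.
    Uses the population variance of total scores (an assumption).
    """
    n_examinees = len(responses)
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]
    total_variance = pvariance(totals)

    # Sum over items of p*q, where p is the proportion answering correctly.
    pq_sum = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in responses) / n_examinees
        pq_sum += p * (1 - p)

    return (n_items / (n_items - 1)) * (1 - pq_sum / total_variance)

# Hypothetical data: four examinees, three items.
data = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(kr20(data))
```

Because the coefficient depends only on how consistently items rank examinees within a single administration, it says nothing about stability over time, which is one reason an internal consistency estimate can look more favorable than other reliability estimates.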

The validity of the GRE general tests is 
somewhat questionable since there is little evi- 
dence to support the predictive power of the tests 
for graduate school achievement. The validity co- 
efficients for predicting first-year grade-point 
averages for the three sections of the general tests 
are around .20 and .30 (Cohn, 1985). It should 
be noted, however, that though these correlations
are only moderate, the sample of students is 
limited to those who have been accepted into a 
graduate program. Thus both the range of scores 
and the number of students included in the 
sample are small. 
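
The effect of a restricted score range on a validity coefficient can be illustrated with the standard correction for direct range restriction (often attributed to Thorndike, Case 2). This correction is not applied in the studies cited here; the sketch below is offered only to show the direction and rough size of the effect. The function name and the ratio of standard deviations used in the example are hypothetical.

```python
from math import sqrt

def correct_for_range_restriction(r_restricted, sd_ratio):
    """Estimate the correlation in the unrestricted group.

    r_restricted: correlation observed in the selected (restricted) sample.
    sd_ratio: predictor SD in the full applicant group divided by the
              predictor SD in the selected sample (assumed known).
    """
    k = sd_ratio
    return (r_restricted * k) / sqrt(1 + r_restricted ** 2 * (k ** 2 - 1))

# Hypothetical case: observed validity of .30, with applicants twice as
# variable on the test as the students actually admitted.
print(round(correct_for_range_restriction(0.30, 2.0), 3))
```

With no restriction (a ratio of 1) the formula returns the observed correlation unchanged; the more selective the admitted group, the more the observed coefficient understates the relationship in the full applicant pool.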

In combination with undergraduate grade 
point average, the predictive validities of the GRE 
general test for graduate school success range 
from .32 to .56 (Jaegar, 1985). Because ETS 
encourages the use of GRE scores in conjunction 
with other admission criteria (e.g., G.P.A., letters
of recommendation), this combined validity
justifies its use.

Numerous researchers have used the GREs 
as a measure of differences in teaching and 
learning at both the institutional and individual 
levels. Many of these studies have been reviewed
by Pascarella (1985) and we have drawn freely 
from that review. 

At the individual level, Nichols (1964) at-
tempted to assess the effects of different colleges
on the GRE verbal and quantitative scores of
students. Nichols examined structural and
organizational characteristics of colleges (private 

library books per student) and the environmental 
characteristics (using Astin's 1963 Environmental
Assessment Technique). The college structural
characteristics were not significantly correlated 
with the GRE scores but three aspects of the EAT 
were significantly correlated with the verbal and 
quantitative scores. The amount of variance 
accounted for by these EAT variables, however, 
was small in comparison to the amount of vari-
ance accounted for by the students' entry charac-
teristics.

Astin (1968) examined variation on the hu-
manities, natural science, and social science GRE
tests as a function of traditional indices of institu-
tional quality. These indices included intelligence
of student body, financial resources, library size,
and student-faculty ratio. Astin found, however,
that all of the partial correlations indicating a
relationship between institutional characteristics
and GRE scores became trivial after accounting
for numerous student entry characteristics (e.g.,
aspirations, high school achievement, family
background). Apparently, student characteristics
are more predictive of GRE area scores than
institutional characteristics. This finding indi-
cates that changes in learning may not be attrib-
utable to institutional characteristics, but perhaps
must be examined at a lower programmatic level.

In further analyses of the same data, Astin 
and Panos (1969) found that institutional charac- 
teristics other than traditional quality indices ex- 
plained some GRE variance. These characteris- 
tics included institutions where students made 
frequent use of automobiles and where students 
were undecided about their careers. Also, GRE
scores were higher at institutions where there 
was a generally flexible curriculum, where there 
was a technical emphasis, and where there was a 
large enrollment. These partial correlations,
however, were also quite small, indicating that 
pre-enrollment characteristics may be more 
meaningful than institutional characteristics for 
determining college outcomes. 

At the institutional level, Rock, Centra, and
Linn (1970) and Centra and Rock (1971) at-
tempted to explain the relationship between
college characteristics and student learning.
Their dependent measure was residual scores on
the three area tests of the GREs: the humanities,
social sciences, and natural sciences. To obtain
these residuals, the authors regressed the average
institutional GRE score on the average SAT score,
which yielded predicted GRE scores for each
institution. The predicted scores were then
subtracted from the actual scores, which pro-
duced the residual score. The proportion of
students majoring in the various areas was also
taken into account.
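
The residual-score procedure described above can be sketched as follows. The four institutions and their average scores are invented, and the function name `residual_scores` is hypothetical. The sketch uses a single predictor (average SAT) and omits the adjustment for the proportion of students in each major, so it is a simplification of what the authors report rather than a reconstruction of their analysis.

```python
from statistics import mean

def residual_scores(avg_sat, avg_gre):
    """Regress institutional mean GRE on mean SAT and return residuals.

    Each residual is the actual mean GRE minus the mean GRE predicted
    from the institution's mean SAT.
    """
    mx, my = mean(avg_sat), mean(avg_gre)
    sxx = sum((x - mx) ** 2 for x in avg_sat)
    sxy = sum((x - mx) * (y - my) for x, y in zip(avg_sat, avg_gre))
    slope = sxy / sxx
    intercept = my - slope * mx
    predicted = [intercept + slope * x for x in avg_sat]
    return [actual - pred for actual, pred in zip(avg_gre, predicted)]

# Hypothetical institutional averages.
avg_sat = [900, 1000, 1100, 1200]
avg_gre = [480, 530, 540, 610]

for sat, res in zip(avg_sat, residual_scores(avg_sat, avg_gre)):
    print(sat, round(res, 1))
```

A positive residual marks an institution whose seniors score higher on the GRE than their entering SAT scores would predict; a negative residual marks the reverse.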

Rock, Centra, and Linn (1970) examined the
influence of institutional characteristics typically 
associated with quality on GRE residual scores. 
Only two of the factors, the income a college
receives per student and the proportion of faculty 
with a doctorate, were consistently related to 
colleges with high residual achievement. 

Centra and Rock (1971) focused on the dif- 
ferences between environmental characteristics of 
colleges and their potential influence on learning. 
The five factors they used, which were derived 
from the Questionnaire on Student and College 
Characteristics, were faculty-student interaction, 
curriculum flexibility, cultural facilities, student 
activism, and degree of academic challenge. 
Centra and Rock found a positive relationship 
between achievement and faculty-student interac- 
tion, curricular flexibility and availability of 
cultural activities. 

In his review of large-scale unpublished
surveys, Pace (1979) discusses two studies that
examine academic achievement during college
using the GREs as the outcome measure. An ETS
compilation of 3,035 scores of seniors from
various colleges showed that students who ma-
jored in one of three subareas (social science,
natural science, and humanities) scored higher
on that section of the GRE area tests than
students who majored in another area. Pace
states that the results simply attest to the fact
that "students know most what they study most"
(p. 25).

The second study Pace reviewed involved the 
advanced tests of the GREs. Harvey and 
Lannholm (1960) tested 300 upperclassmen at 29
institutions both before the students had taken
any upper-division courses and again at the
end of their senior year. Students who had
majored in psychology, economics, or chemistry
were included in the sample. Scores were
typically close to a standard deviation
higher after having taken the upper level courses.
This evidence supports the contention that stu- 
dents learn from studying in a specific field. 

These studies seem to reveal that students
learn in college and the more they study in a
certain field, the more they learn in that field.
The GRE tests appear to be a reasonable measure
for examining learning at the college level. While
few studies have conducted pre- and post-tests of
college students' learning using the GREs, this
option appears to have potential for a useful
measure of differential teaching and learning.

American College Testing Program

The ACT tests measure educational
development in the areas of mathematics
usage, English usage, social studies reading, and
natural sciences reading. The ACT Assessment
Program also includes an Interest Inventory and a
Student Profile. The program was developed in
1954 as a college admissions test and as a tool for
guidance and counseling of freshmen in college.
The program originally grew out of the Iowa Tests
of Educational Development but has since then
become independent.

Five educational development scores are
reported after the ACT tests have been taken.
Four individual scores are reported for the four
subsections and a composite score representing
an average of the four sections is also included.

The content validity of the tests is acceptable
and reasonable according to two reviewers (Aiken,
1985; Kifer, 1985). Predictive validity for the ACT
tests is also quite high. The validity ranges from
.4 to .5 with college freshman grade point
average (Aiken, 1985). However, when high
school grade point average is already included in
the regression equation, the inclusion of ACT
scores improves the predictive validity by only
.10. Kifer (1985) questions the value of the effort
invested to obtain this limited increase in
predictive power.
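
The notion of incremental predictive validity can be
illustrated with the standard formula for the multiple
correlation of a criterion with two predictors. The
sketch below is an assumption-laden illustration, not a
reanalysis: the three correlations are hypothetical
values chosen only to be of the same order as those
discussed above, and the text does not state whether
the reported gain refers to R or to R squared.

```python
import math


def multiple_r(r_y1, r_y2, r_12):
    """Multiple correlation of a criterion with two predictors.

    r_y1, r_y2: correlations of each predictor with the criterion.
    r_12: correlation between the two predictors.
    """
    r_squared = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return math.sqrt(r_squared)


# Hypothetical correlations (assumptions, not figures from the reviews cited).
r_gpa = 0.50      # high school GPA with freshman GPA
r_act = 0.50      # ACT composite with freshman GPA
r_between = 0.40  # high school GPA with ACT composite

r_both = multiple_r(r_gpa, r_act, r_between)
increment = r_both - r_gpa  # gain from adding the second predictor
```

Because the two predictors are themselves correlated,
the second one adds much less than its own validity
coefficient would suggest.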

Another issue mentioned by numerous
reviewers is the amount of overlap within the four
sections of the ACT (Hill, 1978; Kifer, 1985;
Aiken, 1985). There is agreement among these
critics that too much emphasis is placed on
reading in the various sections of the test. The
intercorrelations of the four sections range from
.53 to .68, indicating that similar abilities are
being tested.

The reported reliabilities for the various
subsections are as follows: English usage, .92;
mathematics usage, .91; social studies reading,
.88; and natural sciences reading, .88 (Aiken,
1985). These reliabilities have improved over the
years and are adequate at these levels for
individual decisions based on test scores.

The ACT test scores have been used by
researchers to examine learning and cognitive
development at the college level. Lenning, Munday,
and Maxey (1969) examined cognitive growth in
the first two years of college in five institutions
using tests of the American College Testing
Program. Samples of students were chosen from
two state colleges, one liberal arts college, a junior
college, and a state university. ACT tests were
administered at the beginning of the freshman
year and again at the end of the sophomore year.
Differences between pre- and post-test scores
were significant on all the composite scores for all
groups except the female sample at one
institution. Students made the greatest gains in
social studies and natural sciences and somewhat
lesser gains in English and mathematics.

Dumont and Troelstrup (1981) also used the
ACT tests as a measure of cognitive gains in
college. They tested students at one institution
at the beginning of their freshman year and
again four years later. Students made significant
gains in all areas, showing even more of an
increase in ACT subscores than the sophomores
in Lenning, Munday, and Maxey's sample.

The use of pre- and post-tests for students 
highlights the actual learning that occurs in 
college. The ACT tests seem capable of tapping 
the cognitive development of students in general 
education areas. 
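
A pre- and post-test comparison of the kind described
above is usually summarized by the mean gain and a
paired t statistic. The following sketch is offered
only as an illustration; the scores are invented, and
we do not know which specific significance tests the
cited studies used.

```python
import math
from statistics import mean, stdev

# Hypothetical scores for the same five students at entry and at follow-up.
pre_scores = [18, 20, 22, 19, 21]
post_scores = [20, 23, 23, 22, 24]

# Each student's gain is the post-test score minus the pre-test score.
gains = [post - pre for pre, post in zip(pre_scores, post_scores)]

mean_gain = mean(gains)
# Paired t statistic: mean gain divided by its standard error.
t_stat = mean_gain / (stdev(gains) / math.sqrt(len(gains)))
```

The t statistic would then be compared with a critical
value for n - 1 degrees of freedom. Working with each
student's own gain removes stable differences between
students from the comparison.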

Undergraduate Assessment Program: 
Area Tests and Field Tests 

The Undergraduate Assessment Program
(UAP) is closely related to the GRE. The UAP area
tests are fundamentally the same as the GRE
area tests, and the UAP field tests are similar to
the GRE advanced tests; however, the Business
field test is the only field test still available
(Pace, 1979). The area tests are divided into three
sections: humanities, social sciences, and natural
sciences.

As part of the standardization of the UAP
test, 47,000 seniors from 211 colleges were given
the area tests. Some of the colleges also
administered the tests to other classes. Despite the
problems of comparing cross-sectional data,
especially when the population of colleges was so
heterogeneous and the sample sizes so different,
the results showed that, within the three major
domains, seniors and juniors scored higher than
sophomores and freshmen (Pace, 1979).

Further results from ETS (1976) indicate
that when the UAP test is in the student's area of
interest (i.e., humanities, social sciences, or
natural sciences), the scores are substantially
higher than the scores of the total group of
students from that institution (including the
'interest' group in the total). Seniors within their
area of interest scored higher than sophomores
having the same academic interest (ETS, 1976).

ETS revised the area tests in 1978 and
published a new guide (ETS, 1978). This new
guide reports results similar to the previous one.
Seniors who majored in humanities, social
science, or natural science scored considerably
higher than sophomores and freshmen. The
mean scores on the three area tests increase with
each year of college as well (remember that this is
a cross-sectional data set). Thus, the evidence
reported by ETS leads one to believe that the
more one concentrates on a particular field, the
more one learns in that field. Students are
learning in college and the UAP tests seem able to
capture some aspects of students' cognitive
development.

ACT College Outcome Measures Project 

The American College Testing Program has
developed a unique achievement test called the
College Outcome Measures Program. This project
represents a new direction in achievement testing
(Pace, 1979). Rather than testing general
knowledge and specific content from an academic
discipline, the COMP is an effort to measure
students' ability to apply facts, concepts, and
skills to real world activities. Specifically, the
content of the Measurement Battery involves
three areas related to adult functioning:
functioning within social institutions, using science
and technology, and using the arts. Three types
of competencies are measured within these three
areas: communicating, solving social problems,
and clarifying social values.

The COMP is unusual in its format and in 
the materials used for testing. Unlike the typical 
multiple-choice, paper and pencil format, the 
COMP materials include film excerpts, taped 
newscasts, art prints, magazine and newspaper 
articles, and other realistic materials that might 
be encountered in life. An actual item on the
COMP might require a recorded or a written
response: the student may have to write a
persuasive memo and justify a decision in speech.
The idea is to make the test as realistic and as 
relevant to real life as possible. 

The COMP, therefore, takes considerable
time to administer as well as to evaluate and
score. Taking the test requires approximately six
hours and rating the responses takes about one
hour. Standardized rating scales have been
developed to aid judges in their evaluation
process (Forrest & Steele, 1978).

In addition to the six-hour battery, the
overall COMP includes two additional tests: an 
Activity Inventory and an Objective Test. The 
Objective Test is an effort to maintain the advan- 
tages of the Measurement Battery while decreas- 
ing the time factor. This test uses the same 
stimuli as the Measurement Battery but the 
student is given four options from which to 
choose, two of which are considered good an- 
swers. This test can be machine scored and test- 
taking time is reduced to two and one-half hours. 

The Activity Inventory measures the amount
of experience that an individual has had in the six 
areas that are covered in the Measurement Bat- 
tery. The score is meant to supplement either the 
Objective Test or the Measurement Battery with 
an experience factor. 



College Level Examination Program 

The College Level Examination Program
(CLEP) was originally developed to give students
college credit by examination. CLEP
now offers two types of examinations: the
General Examinations and the Subject Examinations.
The examinations were developed to assess the
knowledge of students who have acquired
knowledge outside the classroom. The General Exams
measure college-level achievement in English
composition, humanities, mathematics, natural
sciences, and social sciences and history. The
test is for general education requirements and
covers material typically studied in the first two
years of college. The Subject Examinations are
more advanced and require specific knowledge in
a particular field. Only the General Exams will be
discussed here.

Reliability coefficients for the General Exams
are quite high, on the order of .90 and above
(Aleamoni, 1985). Validity of the exams, however,
is somewhat in question. The primary validity
information comes from data used for norming
the tests. These data showed that the more
courses taken in an area (i.e., humanities, history
and social sciences, science, and math), the
higher the score on the test (Pace, 1979). These
validity tests were based on a national sample of
college sophomores (N = 2600). The manual from
which information was made available, however,
did not include the correlations between number
of courses taken and test scores. The fact that a
relationship exists is fairly weak evidence for
validity. Aleamoni (1985) states that many
colleges have had to develop their own validation
studies in order to make a case for the
appropriateness of the exams.

National Teacher Examination 

The National Teacher Examination initially
produced by the Educational Testing Service was
a test designed for college seniors and teachers
and provided a standardized measure of academic
preparation in three areas: general education,
professional education (for teachers), and
subject-field preparation. The purpose of the test
was threefold: it could be used to assist colleges
in reviewing their programs and policies; state
departments could use the scores for teacher
certification purposes as well as for attaining
profiles of prospective teachers' knowledge and
skills; and school administrators could use the
scores as a standardized measure for evaluating
the competencies of prospective teachers. ETS,
however, warned against the use of these exams
as a sole determinant of graduation, certification,
and selection decisions. Colleges are warned in
the exam booklet against using absolute cut-off
scores for any type of decision (Merwin, 1978).

The NTE comprised two types of
exams: the Common Examination and the Area
Examinations. Only the Common Examination
will be discussed here. The Common
Examination covers general education and professional
education. The professional education section
consists of 110 items; the general education
sections consist of 45 items on Written English
Expression, 65 items on Social Studies,
Literature, and the Fine Arts, and 50 items on Science
and Mathematics. These sections are generally
seen as including knowledge a well-rounded
educator should possess.

Because of the increasing concern with
teacher certification and the emerging needs of
institutions, ETS expanded the National Teacher
Examinations just described into the National
Teacher Examination Program. This new
program is composed of three sections: the
Pre-Professional Skills Tests (PPST), the NTE Core
Battery, and the NTE Specialty Area Tests.

The PPST is an assessment of basic skills 
initially developed for use in colleges to determine 
whether a student had the basic skills necessary 
to enter a teacher training program. There has 
been an increase, however, in the use of the PPST 
as an initial teacher certification test. The NTE
Core Battery measures skills required for the
beginning teacher. The battery consists of three
tests: Communications Skills, General Knowledge,
and Professional Knowledge. The Communications
Skills test covers listening, reading, and writing 
abilities. The General Knowledge test covers 
mathematics, science, social studies, literature, 
and fine arts. The Professional Knowledge test 
covers knowledge and skills needed for developing 
instructional plans and their implementation as 
well as the professional behavior required of 
teachers. The NTE Specialty Area tests are 
content specific tests available in 28 areas. 

Recent research by Ayres and Bennett (1983)
and Ayres (1983) used the old NTE as an outcome
variable to assess differences in learning across
institutions and individuals. The authors judged
the NTE to be a reasonable measure of learning
usually expected during general undergraduate
study. Using the institution as a unit of analysis
(N = 15), Ayres and Bennett (1983) explained 88%
of the variance in NTE scores by including in the
regression equation average institutional SAT
score, average number of courses taken in general
education, average faculty salary, average
educational attainment of faculty, institutional age,
library size, and institutional size. The average
educational attainment of faculty members
accounted for the largest percentage of the
variance.
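
When reading an explained-variance figure based on a
small number of cases, it can help to compute the
conventional adjusted R squared. The sketch below
applies that textbook adjustment to the figures quoted
above (R squared of .88, 15 institutions, and the seven
predictors listed). The adjustment is our illustration
and is not a statistic reported by the original authors.

```python
def adjusted_r_squared(r_squared, n_cases, n_predictors):
    """Conventional small-sample adjustment to R squared."""
    return 1 - (1 - r_squared) * (n_cases - 1) / (n_cases - n_predictors - 1)


# Figures quoted in the text; the count of seven predictors is read
# from the list of variables above (an assumption of this sketch).
adjusted = adjusted_r_squared(r_squared=0.88, n_cases=15, n_predictors=7)
```

With these inputs the adjusted value is about .76, which
shows how much of an apparent fit can be attributable to
the ratio of predictors to cases.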



Using the same data, but examining it with
the student as the unit of analysis (N = 2,229),
Ayres (1983) examined the effect of the racial
composition of the institution on NTE scores.
Ayres found that when controlling for aptitude
(SAT score), black students in a primarily white
institution performed better than black students
in a predominantly black institution.

These successful attempts at explaining
variation on the NTE provide evidence for its
usefulness as a measure of some aspects of
college achievement. The evidence presented by
ETS (in Merwin, 1978) suggests, on the other
hand, that the predictive validity of the NTE limits
its usefulness as a selection or certification tool in
education. As an academic outcome measure of
the effectiveness of teaching and learning,
however, it may be useful and appropriate.

Critical Thinking and Higher Level 
Outcome Measures 

A number of tests have been developed to 
measure the higher level cognitive processes of 
college students. These cognitive processes 
include critical thinking, complex reasoning and 
judgment, abstract thinking, and flexibility of 
thought. In this section, some research findings
are briefly reviewed and information available on
the tests used is reported. The measures used
in this research, however, are often not
standardized and are developed by researchers for their
particular interests. Neither can we claim to have
exhausted the extensive and growing literature in
this area. Within the NCRIPTAL work, McKeachie
and colleagues present a more complete review of
such measures.

Evidence supports the notion that critical
thinking abilities improve over the college years.
Lehmann (1963) used the American Council on
Education's Test of Critical Thinking in a
longitudinal study of students at Michigan State
University. The ACE test taps five dimensions of critical
thinking: (1) defining a problem, (2) selecting
information relevant to the problem, (3)
recognizing stated and unstated assumptions, (4)
formulating and selecting relevant hypotheses, and (5)
drawing valid conclusions. Lehmann tested
1,051 students on entering college and again at
the end of the freshman year and every
subsequent year. All students tested had significantly
higher scores as seniors than they did as
freshmen. The most significant gains occurred during
the freshman year.

Keeley, Brown, and Kreutzer (1982) used a
cross-sectional design and administered open-ended
and essay measures of critical thinking to 145
freshmen and 155 seniors at a large state
university. Two experimental conditions were employed,
with half of each class cohort receiving each
treatment. In one condition, students were given
very general instructions on how to respond to
the items. The other group received specific
instructions for writing critical evaluations of the
items.

In the general instruction condition, seniors
had significantly higher scores in six of the seven
criticism categories on which they were judged.
These categories included general criticisms,
understanding of structure, logical inconsistency,
explicit criticism, and essay length. In the
specific instruction condition, seniors were more
skilled at identifying the controversy and
conclusions of the essay, and identifying assumptions.
The seniors also received higher overall scores in
this condition.

Another critical thinking measure, the
Watson-Glaser Critical Thinking Appraisal
(Watson & Glaser, 1964), has been used by a number
of researchers and may be the most widely used
critical thinking instrument. This instrument is
designed to measure three dimensions of critical
thinking: inference, deduction, and recognition.
Recognition refers to the ability to recognize
unstated assumptions. Inference refers to the
ability to distinguish between valid and invalid
inferences drawn from data. Deduction refers to
the ability to reason deductively, from the general
to the specific.

Mentkowski and Strait (1983) studied the
development of critical thinking skills during the
college years using the Watson-Glaser. This
research was part of a comprehensive outcome
evaluation program that Alverno College uses to
examine cognitive development of students.

The design was longitudinal, with more than 
700 freshmen tested at the beginning of their
freshman year and again at the end of the sopho- 
more and senior years. Significant increases in 
scores on all three dimensions of the Watson- 
Glaser were found between sophomore and senior 
years. Significant increases on the inference and 
deduction scales were found between the fresh- 
man and sophomore years. 

An additional assessment tool used in 
Alverno's Evaluation Program is a Piagetian
formal reasoning task. The task essentially 
measures the student's ability to reason 
abstractly: there were two proportionality prob- 
lems, two conservation of volume problems and 
one problem dealing with the separation of vari- 
ables. 

Mentkowski and Strait (1983) found
significant increases in formal reasoning between
freshman and sophomore year. A similar
increase in formal reasoning ability was also found
by Eisert and Tomlinson-Keasey (1978) in a study
of 55 freshmen. They found significant increases
between the start and end of the freshman year.

Another measure available is the Test of
Thematic Analysis.
This broad essay exam measures thinking and
reasoning ability. In the test, students are given
two different groups of Thematic Apperception
Test stories and are asked to describe the
differences between the two groups in an essay. The
essays are judged on nine reasoning and thinking
criteria.

Winter and McClelland (1978) conducted a
multiple institution study of cognitive
development using the Test of Thematic Analysis.
Samples were drawn from three institutions: an
elite liberal arts college, a state teachers college,
and a community college. Longitudinal and
cross-sectional data were collected from the
liberal arts college; the design for the teachers
college and community college was
cross-sectional.

Longitudinal data from 80 students at the
liberal arts college showed significant differences
from freshman to senior year on the Thematic
Analysis score. The cross-sectional designs
resulted in significant findings only at the elite
liberal arts college. Data from the teachers
college and community college did not show
statistically reliable increases.

Winter, McClelland, and Stewart (1981)
sought to measure intellectual flexibility in
reasoning using Stewart's Analysis of Argument Test
(1977). This test confronts a subject with a
controversial statement and asks the subject to
write two essays: one defending the statement
and one attacking it. The two essays are scored
on ten criteria. These criteria are meant to
measure the extent to which the subject can
evaluate an argument and construct a coherent
evaluation of an argument.

The study involved the same samples as
those in Winter and McClelland's 1978 study.
Statistically significant differences were found
between freshmen and final year students at all
three institutions on total score of the Analysis of
Argument Test.

The Reflective Judgment Interview (RJI) is
another available measure of higher level
cognitive skills. Reflective judgment refers to the
development of complex reasoning and judgment
skills. In the interview, the subject is confronted
with four controversial dilemmas and a set of
standardized questions designed to tap level of
reasoning. Level of reasoning is determined
based on a Perry-like scheme of intellectual
development (Perry, 1970). Schmidt and Davison
(1981) classify level of reasoning along a multilevel
continuum, ranging from dualism to probabilism.
Dualism is a simple and illogical reasoning
pattern and probabilism is reasoning based on
evidence and logic.

Brabeck (1983) provides a review of ten
studies that have used the RJI as a dependent
variable. Many of these studies used a
cross-sectional design to measure differences in
reflective judgment across educational levels. The
results support the idea of a reflective judgment
continuum, with more advanced students
showing higher levels of reasoning ability.
Postsecondary education does have an influence on the
development of reasoning ability.

NCRIPTAL researchers McKeachie, Pintrich,
Lin, and Smith provide a detailed and more
technical review of measures of two other
academic-cognitive outcomes, problem-solving skills
and knowledge representation. Their
investigation will be incorporated into this background
paper in its next revision. Readers are referred to
the list of NCRIPTAL technical reports noted at
the beginning of this report.

In sum, many higher level cognitive
processes appear to improve in college. Many
measures are available for evaluating improvement in
these areas. In this area, most measures are
developed specifically for a given research
program though they may be useful for other
purposes.

Basic Skills

Although basic skills are generally not
viewed as an outcome of college, they are
becoming an increasingly important part of the college
curriculum. As access to college for
disadvantaged students has improved, so has the concern
for keeping those students in college and helping
them to succeed through the use of remedial and
basic skill programs.

Concern for basic skills programs at the 
four-year college level has grown out of fairly 
recent changes in the student population.
Two-year community and junior colleges have always
been concerned with basic skills as both a
requirement and an outcome of their programs.

For the purposes of evaluating outcomes as 
well as the maintenance of skills in both two- and
four-year colleges, basic skills measures are 
considered here as an outcome measure that 
some colleges may wish to incorporate or use as a 
pre-test at entry. 

Basic skills usually refer to fundamental
abilities in mathematics, English composition,
reading, and vocabulary. While most programs
for identifying basic skills at the college level are
developed at the institutions, there are tests
available for evaluating students in this area.
Appendix B presents information on a number of
reading, math, and vocabulary tests designed for
the college level. While not an inclusive list, it
represents some available tests of basic skills.

Wolfe (1983) studied the development of
basic skills in college. He investigated the
progression of students in vocabulary and
mathematical ability using 1979 follow-up data from the
National Longitudinal Study of the High School
Class of 1972. He found that, when controlling
for ethnicity, parents' education, father's
occupation and the 1972 scores, postsecondary
education significantly improved both vocabulary and
mathematics performance.

Academic-Motivational Outcomes

Measures covered in this section include
those related to motivation, self-efficacy, student
involvement, and satisfaction with college.

Two motivational outcomes, motivation and
self-efficacy, are discussed in detail in a
companion report by McKeachie et al. Another
companion paper by Korn discusses self-efficacy as both
an input and outcome variable in college. Rather
than repeat their efforts, we refer the reader
interested in these outcomes to the two other
reviews.

Involvement and Quality of Effort 

The relatively new concepts of involvement
and quality of effort have both motivational and
behavioral components. We have chosen to
include them as motivational outcomes because
of the ties between these concepts and
motivation.

Astin's (1984) theory of student involvement
can be simply summarized as "Students learn by
becoming involved" (Astin, 1985, p. 133). Astin
defines involvement as the amount of physical
and psychological energy that the student devotes
to the academic experience. He considers
involvement as closely linked to motivation but prefers
the term involvement because it has more
behavioral implications.

Involvement theory has five postulates.
First, involvement consists of investing energy
into objects. Second, involvement exists along a
continuum. Third, involvement has quantitative
and qualitative aspects. Fourth, the amount of
student learning and personal development is
related to the amount of student involvement.
Fifth and finally, the effectiveness of an
institutional policy or practice is related to the capacity
it has to increase student involvement.

Astin (1977) conducted a large-scale
longitudinal study investigating the effects of various
forms of involvement on numerous student
outcomes. His general conclusion was that most
forms of involvement lead to greater than average
changes in the entry characteristics of freshmen.
In some instances, involvement was more strongly
related to outcomes than either entry or
institutional characteristics.

Most of the outcomes Astin measured were
motivational outcomes. More research is needed
to determine the effects of involvement on
cognitive outcomes. No obvious instrument is yet
available to measure student involvement.

Pace (1984) discusses the concept of quality
of effort as an important determinant of student
outcomes. He posits that because education is
both a process and product, the quality of the
educational experience must be taken into
account. The quality of this process is not the sole
responsibility of the institution or its faculty
members; rather, students must take some
responsibility for their own progress by taking
advantage of opportunities provided to them by
the institution. Thus, by measuring quality of
student effort one can better assess the quality of
the educational process.

Pace developed an instrument to measure
the quality of student experiences by determining
the extent to which students take part in
activities and opportunities intended to promote
student learning and development. Pace's Quality of
College Student Experiences (1984), a standardized
self-report survey, includes fourteen scales of
activities (e.g., student union, athletic and
recreational facilities, experiences in writing, library
experiences) which reflect increasing amounts of
effort and potential value. The scored responses
provide a measure of the quality of effort students
have invested in the various aspects of college life.

The survey also collects information on 
college environment, student background infor- 
mation, and gains made during college. Pace 
suggests that the instrument be used by institu- 
tions for program evaluation, resource allocation, 
and faculty and staff discussions. The
instrument can also be used as a research tool for
investigating the relationship between quality of
student effort and institutional characteristics.

Pace (1984) reports reliability and validity
information in the manual. For reviews of the
instrument and comments on the psychometric
features reported in the manual see Miller (1985)
and Brown (1985).

Pace also reports data from studies using the
Quality of College Student Experiences Survey.
Results support the idea that quality of effort is
an important predictor of student achievement;
effort measures significantly increase the amount
of explained variance in student achievement
(Pace, 1984). The increase in explanatory power
occurs when student characteristics, college
status variables, and college environment ratings
have already been included in the regression
equation.



The concepts of quality of effort and
involvement are often considered process variables; that
is, variables that moderate the relationship
between available learning opportunities and
student outcomes. However, when education is
viewed as an iterative process with current
outcomes influencing future achievement, then these
concepts can be considered outcomes in the
sense that the development of effort and
involvement will be an outcome that in turn influences
the educational process. For this reason, we are
including these as possible outcome measures at
the college level.

Academic-Behavioral Outcomes 

Behavioral measures of students' cognitive
development are difficult to find in the literature.
Most measures used in research and practice are
either motivational or cognitive. As mentioned
earlier, involvement and quality of effort measures
can be considered behavioral in the sense that
they measure self-reports of participation in
various activities. Other behavioral measures of
academic outcomes that NCRIPTAL researchers
are considering include goal-exploring behaviors,
relationships with faculty, and persistence.

Career and Life Goal Exploration

One desired outcome of college is the
exploration and development of life options, including
both appropriate career choices and recognition of
values to be gained from liberal education.
Interestingly, however, literature in these areas seems
to advocate one of these types of student
development to the exclusion of the other.

A wide variety of instruments exists to
measure students' career exploration activities.
These include vocational development inventories,
career maturity scales, and search procedures
that help students identify occupational groups
with similar personality characteristics or
interests. Most of these instruments are designed to
assist the "undecided college student" and appear
to be based on the assumption that being
undecided is both economically inefficient and
psychologically unsettling to the college student.

In a different mode, considerable rhetoric 
advances the value of liberal education in 
preference to early (or premature) decisions about 
career specialization. Although this openness to 
liberal learning is highly valued by many 
educators, the authors know of no instrument 
designed to measure such student proclivities. 
Most existing information is based on surveys of 
entering college students. In recent years the 
percentage of students espousing vocational goals 
has risen substantially while the percentage who 
desire general education remains relatively 
stable. 

From the standpoint of improving teaching and 
learning, the important question is whether 
student openness to considering various 
educational and career alternatives changes 
because of specific educational experiences. For 
example, does study of liberal arts subjects 
result in students' placing greater or less value 
on this knowledge? Does studying career-oriented 
subjects close one's mind to the value of liberal 
education? 

Such questions have been studied by only a few 
researchers. To illustrate, Mentkowski and Doherty 
(1984) report that students at a college with a 
competency-based liberal arts curriculum moved 
from strong initial career orientations toward an 
appreciation of liberal arts. 

Instruments that measure student goals in a 
multi-dimensional fashion, so that change in 
either direction can be assessed, await 
development. 

Exploration of Diversity 

We have used the term "exploration of diversity" 
to represent a complex set of educational outcomes 
that are not captured in other categories of our 
framework. One of the best known frameworks for 
measuring such outcomes in the academic-cognitive 
sense is Perry's scheme of intellectual 
development (Perry, 1970). In this scheme, 
intellectual development is measured by students' 
movement from a position of dualism (right/wrong) 
to a more balanced consideration of a variety of 
viewpoints (relativism), and finally to selecting 
and justifying one's own point of view 
(commitment). A variety of paper-and-pencil 
measures of student change on the Perry dimensions 
are under development. 

In the academic-behavioral sense, exploration of 
diversity may be reasonably well captured by some 
portions of Pace's Quality of Student Experience 
Scale, discussed earlier. Attendance at campus 
events, for example, might be a measure of the 
impact of college in broadening student horizons 
as well as an index of student effort. 

Our intent in listing these exploratory behaviors 
as outcomes is to stimulate thinking about broad 
behavioral observations through which colleges 
might measure the extent to which students become 
more likely to participate in further education 
and cultural affairs and to exhibit other 
behaviors generally attributed to educated people. 
One of the deficiencies in previous outcome 
typologies has been a strong dependence on 
high-inference measures and a notable lack of 
actual behavior observations in assessing student 
outcomes. We will be exploring measures of these 
types of outcomes in the future. 



Persistence 

Why students drop out of college has been studied 
by numerous researchers in an effort to understand 
the determinants of attrition. Tinto's (1975) 
model of attrition posits that academic and social 
integration of the student into the institution 
and the student's interaction with these systems 
are the primary determinants of persistence in 
college. 

In testing the Tinto model, Munro (1981) found 
that academic integration had a strong effect on 
persistence while social integration was not a 
significant predictor. Similarly, Pascarella, 
Duby, and Iverson (1983) found that academic 
integration was a significant predictor of student 
persistence. In addition, they found that entry 
characteristics were more predictive of 
persistence in a non-residential setting than in a 
residential setting. 

Pascarella, Smart, and Ethington (1986) studied 
persistence in students at two-year colleges over 
a period of nine years. They found that both 
academic and social integration were important 
predictors of persistence when the students were 
tracked for a longer period of time. 

Edwards and Waters (1983) attempted to explain 
persistence by using academic course involvement, 
academic ability, academic performance, and 
satisfaction with both courses and college in 
general as predictors. In a replication study, 
when they included a personal needs/college 
climate discrepancy index and a 
voluntary/involuntary attrition breakdown, they 
found that the discrepancy index was marginally 
significant as a predictor of voluntary attrition. 

As an outcome variable, persistence thus has most 
often been correlated with various independent 
variables in hopes of identifying facilitating 
conditions. While it is not a direct measure of 
improved learning at the college level, attendance 
is a prerequisite for continued cognitive 
development that can be directly linked to the 
college experience. Quite possibly, persistence as 
a dichotomous variable is not as useful an outcome 
in achieving teaching and learning improvement as 
involvement or quality of effort, which could be 
construed to represent various levels of 
persistence. 

Faculty-Student Relationships 

An additional behavioral outcome to be considered 
here, student interactions with faculty, can be 
considered both an outcome and a process variable. 
Informal interactions with faculty are included in 
Tinto's model of attrition as an important aspect 
of academic integration. In this sense, 
relationships are considered process variables in 
the educational cycle; they determine and affect 
educational outcomes. However, when viewing 
education as an iterative process, current 
relationships with faculty can be viewed as 
outcomes that may affect future educational 
outcomes. 

Pascarella (1980) completed a comprehensive review 
on the relationship between informal 
student-faculty interaction and college outcomes. 
He concluded that the extent and quality of 
student-faculty interactions had significant 
positive associations with students' educational 
aspirations, their attitudes toward college, their 
academic achievement, their intellectual and 
personal development, and their persistence in 
college. Bean and Kuh (1984), on the other hand, 
found no significant relationship between informal 
contact with faculty and student grade point 
average. 

A recent study by Volkwein, King, and Terenzini 
(1986) investigated the relationships that develop 
between faculty members and transfer students. In 
this subsample of college students, perceptions 
about the quality and strength of their 
relationships with faculty were significantly 
related to intellectual growth. 

Thus far, most measures of faculty-student 
relationships have been student self-reports of 
the number of hours per semester of 
non-class-related interactions with faculty. There 
is room for development of other valid measures of 
this association. 

In sum, there are several academic-behavioral 
outcomes of college, but few measures have been 
developed. Though these outcomes have been 
investigated less frequently than the cognitive 
and affective outcomes, there is evidence to 
support the importance and salience of these 
outcomes for college students. 

Conclusion 

In this working paper, NCRIPTAL has specified its 
concerns for the technical parameters of outcome 
measures that are part of the current discussions 
of assessment, assuming that the policy parameters 
are held constant through our collaborative work 
with institutions that desire to improve teaching 
and learning. Further, we have delineated the type 
and form of outcomes through which we hope to 
measure the effectiveness of various alterations 
in teaching and learning environments. We have 
briefly reviewed some of the forms of outcome 
measurement that are available for our use and 
that of others, and we have identified some gaps 
in available measurement techniques. As we learn 
more about new instruments and techniques that are 
being used with apparent success in purposive 
improvement of teaching and learning, we will 
expand the information in this paper. 

Appendix A. Categories of the NCHEMS Outcome Structures 

Categories of the NCHEMS Outcomes Structure 

CATEGORY 
CODE #  ENTITY BEING MAINTAINED OR CHANGED 

1000  Economic Outcomes 
1100    Economic Access and Independence Outcomes 
1110      Economic Access 
1120      Economic Flexibility, Adaptability, and Security 
1130      Income and Standard of Living 
1200    Economic Resources and Costs 
1210      Economic Costs and Efficiency 
1220      Economic Resources (including employees) 
1300    Economic Production 
1310      Economic Productivity and Production 
1320      Economic Services Provided 
1400    Other Economic Outcomes 

2000  Human Characteristics Outcomes 
2100    Aspirations 
2110      Desires, Aims, and Goals 
2120      Dislikes, Likes, and Interests 
2130      Motivation or Drive Level 
2140      Other Aspirational Outcomes 
2200    Competence and Skills 
2210      Academic Skills 
2220      Citizenship and Family Membership Skills 
2230      Creativity Skills 
2240      Expression and Communication Skills 
2250      Intellectual Skills 
2260      Interpersonal, Leadership, and Organizational Skills 
2270      Occupational and Employability Skills 
2280      Physical and Motor Skills 
2290      Other Skill Outcomes 
2300    Morale, Satisfaction, and Affective Characteristics 
2310      Attitudes and Values 
2320      Beliefs, Commitments, and Philosophy of Life 
2330      Feelings and Emotions 
2340      Mores, Customs, and Standards of Conduct 
2350      Other Affective Outcomes 
2400    Perceptual Characteristics 
2410      Perceptual Awareness and Sensitivity 
2420      Perception of Self 
2430      Perception of Others 
2440      Perception of Things 
2450      Other Perceptual Outcomes 
2500    Personality and Personal Coping Characteristics 
2510      Adventurousness and Initiative 
2520      Autonomy and Independence 
2530      Dependability and Responsibility 
2540      Dogmatic/Open-Minded, Authoritarian/Democratic 
2550      Flexibility and Adaptability 
2560      Habits 
2570      Psychological Functioning 
2580      Tolerance and Persistence 
2590      Other Personality and Personal Coping Outcomes 
2600    Physical and Physiological Characteristics 
2610      Physical Fitness and Traits 
2620      Physiological Health 
2630      Other Physical or Physiological Outcomes 
2700    Status, Recognition, and Certification 
2710      Completion or Achievement Award 
2720      Credit Recognition 
2730      Image, Reputation, or Status 
2740      Licensing and Certification 
2750      Obtaining a Job or Admission to a Follow-up Program 
2760      Power and/or Authority 
2770      Job, School, or Life Success 
2780      Other Status, Recognition, and Certification Outcomes 
2800    Social Activities and Roles 
2810      Adjustment to Retirement 
2820      Affiliations 
2830      Avocational and Social Activities and Roles 
2840      Career and Vocational Activities and Roles 
2850      Citizenship Activities and Roles 
2860      Family Activities and Roles 
2870      Friendships and Relationships 
2880      Other Activity and Role Outcomes 
2900    Other Human Characteristic Outcomes 

3000  Knowledge, Technology, and Art Form Outcomes 
3100    General Knowledge and Understanding 
3110      Knowledge and Understanding of General Facts and Terminology 
3120      Knowledge and Understanding of General Processes 
3130      Knowledge and Understanding of General Theory 
3140      Other General Knowledge and Understanding 
3200    Specialized Knowledge and Understanding 
3210      Knowledge and Understanding of Specialized Facts and Terminology 
3220      Knowledge and Understanding of Specialized Processes 
3230      Knowledge and Understanding of Specialized Theory 
3240      Other Specialized Knowledge and Understanding 
3300    Research and Scholarship 
3310      Research and Scholarship Knowledge and Understanding 
3320      Research and Scholarship Products 
3400    Art Forms and Works 
3410      Architecture 
3420      Dance 
3430      Debate and Oratory 
3440      Drama 
3450      Literature and Writing 
3460      Music 
3470      Painting, Drawing, and Photography 
3480      Sculpture 
3490      Other Fine Arts 
3500    Other Knowledge, Technology, and Art Form Outcomes 

4000  Resource and Service Provision Outcomes 
4100    Provision of Facilities and Events 
4110      Provision of Facilities 
4120      Provision of Sponsorship of Events 
4200    Provision of Direct Services 
4210      Teaching 
4220      Advisory and Analytic Assistance 
4230      Treatment, Care, and Referral Services 
4240      Provision of Other Services 
4300    Other Resource and Service Provision Outcomes 

5000  Other Maintenance and Change Outcomes 
5100    Aesthetic-Cultural Activities, Traditions, and Conditions 
5200    Organizational Format, Activity, and Operation 
5300    Other Maintenance and Change 

Note. The fourth-level categories, into which any of the categories listed here can be divided, 
are "maintenance" (a fourth digit of "1") and "change" (a fourth digit of "2"). 

Source. Oscar T. Lenning, Young S. Lee, Sidney S. Micek, and Allan L. Service, A Structure 
for the Outcomes of Postsecondary Education (Boulder, Colo.: National Center for Higher 
Education Management Systems, 1977), p. 27. 
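
The hierarchical coding in this appendix can be illustrated with a short sketch. It assumes, following the note to the table, that the first three digits locate a category at successively finer levels and that a fourth digit of 1 or 2 marks "maintenance" or "change." The function name, the dictionary names, and the three-entry label excerpt are hypothetical conveniences for illustration and are not part of the NCHEMS structure itself.

```python
# Illustrative sketch only: decode() and LABELS are hypothetical names,
# and LABELS holds just a three-entry excerpt of the Appendix A table.
LABELS = {
    "2000": "Human Characteristics Outcomes",
    "2200": "Competence and Skills",
    "2210": "Academic Skills",
}

# Assumption taken from the table note: fourth digit 1 = maintenance,
# 2 = change; 0 means no fourth-level category is specified.
FOURTH_DIGIT = {"0": None, "1": "maintenance", "2": "change"}


def decode(code: str) -> dict:
    """Split a four-digit category code into its levels."""
    if len(code) != 4 or not code.isdigit():
        raise ValueError("expected a four-digit code such as '2212'")
    if code[3] not in FOURTH_DIGIT:
        raise ValueError("fourth digit must be 0, 1, or 2")

    # Build the parent chain: 2212 -> 2000, 2200, 2210.
    candidates = [code[0] + "000", code[:2] + "00", code[:3] + "0"]
    path = []
    for key in candidates:
        if key not in path:  # e.g. 2200 would otherwise repeat itself
            path.append(key)

    return {
        "path": [(k, LABELS.get(k, "(label not in this excerpt)")) for k in path],
        "kind": FOURTH_DIGIT[code[3]],
    }


if __name__ == "__main__":
    print(decode("2212"))
```

Under these assumptions, a code such as 2212 would read as a change in Academic Skills, within Competence and Skills, within Human Characteristics Outcomes.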

Appendix B. Basic Skills Tests for College Students 



Nelson-Denny Reading Test, Forms E and F 

Purpose: To evaluate reading comprehension, vocabulary development, and 
reading rate. 

Use: To screen, predict college success, and to diagnose reading difficulties. 

Reliability: Test-retest reliabilities are .89 to .95 for the vocabulary 
subtest, .75 to .82 for the comprehension subtest, and .62 to .82 for 
reading rate. 

Validity: Limited information available; comprehension subtest is context 
dependent. 

Degrees of Reading Power 

Purpose: To measure reading effectiveness. 

Use: To predict probability of success for students in prose materials of 
varying difficulties. 

Reliability: K-R 20 coefficients vary between .93 and .97. 

Validity: Correlations with CAT reading test range from .77 to .85. 

Prescriptive Reading Performance Test 

Purpose: To evaluate reading and spelling patterns and determine how a student 
employs visual and auditory modalities in reading. 

Use: To make a preliminary assessment of reading level and reading 
comprehension; to identify strengths and weaknesses. 

Reliability: Test-retest is .98. Spearman-Brown corrected split-half is .98. 

Validity: Pearson correlations with six other reading measures range from .65 to 
.94. 

Self-scoring Reading Placement Test 

Purpose: To evaluate reading and mathematical skills necessary for success in a 
two-year college for students entering postsecondary institutions with 
open-door policies. 

Use: To assist in placing students in college courses. 

Reliability: K-R 20 coefficients available in manual. 

Validity: Predictive validity coefficients for English and mathematics grades 
range from .06 to .70, with a median of .40. 

Test of Mathematical Abilities 

Purpose: To assess mathematical attitudes and aptitudes. 

Use: For individual assessment. 

Reliability: Internal consistency reliabilities range from .96 to .57. 

Validity: Correlations of .26 to .31 between attitude subscale scores and three 
standardized mathematics tests. 

AH Vocabulary Scale 

Purpose: To evaluate vocabulary level. 

Reliability: K-R 21 coefficients range from .60 to .90; split-half coefficients range 
from .70 to .90 for all 80-word tests. 

Validity: Correlates between .50 and .75 with vocabulary tests and variables such 
as non-verbal, reading, and mathematics ability and intelligence. 

Comprehensive Tests of Basic Skills: Reading 

Purpose: To evaluate reading and reference skills. 

Reliability: K-R 20 coefficients for total reading scores are .94 to .97, for reference 
skills are .76 to .94. 

Validity: Information not available. 

Iowa Silent Reading Tests 

Purpose: To measure vocabulary, reading comprehension, and reading efficiency. 

Reliability: Median alternate-forms reliabilities for vocabulary, comprehension, and 
efficiency are .86, .83, and .77, respectively. 

Validity: No predictive validity information is available; correlations with other 
reading tests are in the .70s and .80s. 

Reading Progress Scale 

Purpose: To evaluate "reading-input" performance. 
Reliability: Alternate-form estimate is .84. 
Validity: Only available for grades 3-6. 

Sequential Tests of Educational Progress: Reading, Series II 

Purpose: To measure sentence and passage comprehension. 

Reliability: Internal consistency and alternate-forms reliability information is 
available in manual. 

College Board Achievement Test in Mathematics, Level 1 

Purpose: To measure mathematical abilities of persons with at least three years of 
college preparatory mathematics courses. 

Reliability: K-R 20 coefficient is .88. 

Validity: Good predictor of college grades but does not add much predictive ability 
when SAT scores and high school GPA are already included. 

*Information for this appendix was gathered from reviews in the Mental Measurements Yearbook. 
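
Several entries in this appendix report K-R 20 coefficients and a Spearman-Brown corrected split-half coefficient. As a reminder of what those statistics compute, the following is a minimal sketch using the standard textbook formulas. It assumes items scored 0 or 1 and uses the population variance of total scores (conventions differ on this point); the function names and the tiny response matrix are hypothetical and are not drawn from any of the tests listed.

```python
from statistics import pvariance


def kr20(responses):
    """Kuder-Richardson formula 20 for dichotomously scored (0/1) items.

    responses: list of examinee rows, each a list of 0/1 item scores.
    Assumption: population variance of the total scores is used.
    """
    n_people = len(responses)
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]
    total_var = pvariance(totals)
    if total_var == 0:
        raise ValueError("total scores have zero variance")

    # Sum of p * q over items, where p is the proportion answering correctly.
    pq_sum = 0.0
    for i in range(n_items):
        p = sum(row[i] for row in responses) / n_people
        pq_sum += p * (1 - p)

    return (n_items / (n_items - 1)) * (1 - pq_sum / total_var)


def spearman_brown(r_half):
    """Step a half-test correlation up to an estimate for the full-length test."""
    return 2 * r_half / (1 + r_half)


if __name__ == "__main__":
    # Hypothetical data: four examinees, three items.
    data = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
    print(round(kr20(data), 3))
    print(round(spearman_brown(0.5), 3))
```

Both coefficients are internal-consistency estimates, so they are not directly comparable with the test-retest and alternate-forms figures reported for other instruments in the appendix.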